Abstract — RF sensor networks are wireless networks that can
localize and track people (or targets) without needing them to 
carry or wear any electronic device. They use the change in the 
received signal strength (RSS) of the links due to the movements 
of people to infer their locations. In this paper, we consider 
real-time multiple target tracking with RF sensor networks. 
We perform radio tomographic imaging (RTI), which generates 
images of the change in the propagation field, as if they were 
frames of a video. Our RTI method uses RSS measurements 
on multiple frequency channels on each link, combining them 
with a fade level-based weighted average. We describe methods 
to adapt machine vision methods to the peculiarities of RTI
to enable real-time multiple target tracking. Several tests are
performed in an open environment, a one-bedroom apartment, 
and a cluttered office environment. The results demonstrate that 
the system is capable of accurately tracking in real-time up 
to 4 targets in cluttered indoor environments, even when their 
trajectories intersect multiple times, without mis-estimating the 
number of targets found in the monitored area. The highest 
average tracking error measured in the tests is 0.45 m with two 
targets, 0.46 m with three targets, and 0.55 m with four targets. 

Index Terms — radio tomography, multiple target tracking, 
device-free localization, wireless sensor networks 



I. Introduction 

RADIO frequency (RF) sensor networks are wireless networks that monitor the changes in the received signal strength (RSS) of the links in order to localize and track people, without requiring them to carry or wear any radio device. In these systems, the RSS measurements can be processed to form images of the changes in the propagation field of the monitored area in the presence of moving people and objects, a process named radio tomographic imaging (RTI) ifTTl . Unlike previous works Q, (2), 0, (4), which have focused on single target localization and tracking, this work aims to contribute to the growing field of device-free localization (DFL) through RF sensor networks d, (5) by presenting a system capable of accurately tracking in real-time multiple people (or targets) moving in real-world indoor environments where low-power transceivers are deployed.

The potential applications of RF sensor networks are many, including smart buildings and perimeter surveillance, ambient assisted living and residential monitoring [6], breathing detection Q, and security and rescue operations. Compared to other sensing technologies applied in indoor DFL, such as infrared, ultrasonic range finders [8], ultra-wideband (UWB) radios and video cameras, RF sensor networks provide several advantages: they work in the dark and can penetrate smoke and walls; they are less invasive in domestic environments than video camera networks; they are significantly less expensive than UWB transceivers; and their installation and maintenance time is minimal.

People moving in an area where wireless transceivers are deployed affect the propagation of the radio signals by shadowing, reflecting, diffracting or scattering a subset of their multipath components (5), 0, fTOlL ifTTl . In open and uncluttered environments, where line-of-sight (LoS) communication among the transceivers is predominant, a person obstructing a link line will generally attenuate the radio signal. This phenomenon has been successfully applied for DFL in several works (U, Q, |3), (4). However, in cluttered environments, where multipath propagation is predominant, the change in RSS due to the presence of a human body becomes less predictable: when the link line is obstructed, the RSS can also remain constant or even increase flTTl . In addition, due to the multipath propagation of the radio signals, people can affect the RSS even when located far away from the link line E2.

Most of the research presented so far in the area of DFL with RF sensor networks has considered the situation in which only one target has to be located and tracked 0, 10- 0, |4|. However, the real-world scenarios in which these systems are to be used often require the localization and tracking of multiple targets. Moreover, the DFL system should be able a) to perform the localization and tracking tasks in real-time even when the targets' trajectories intersect, and b) to correctly estimate the number of targets to be tracked as people enter and exit the monitored area. In this work, we tackle all of these challenges. In forming the RTI images, we apply a novel approach that weights the RSS measurements based on the fade level 031 of the frequency channels on which the RSS was measured. We adapt machine vision methods to RTI and use the new methods to process the RTI images in real-time to detect and track the blobs corresponding to real targets.
Multiple target tracking with RF sensor networks is made more challenging by the fact that in RTI the targets have to be modeled as points, described only by position, velocity and acceleration, with no physical extent or other features such as shape, color, or size. Although the blob size and shape in RTI are loosely related to a person's size, the blob changes dramatically depending on the person's position in the monitored area and on the positions of other people in the area. Thus, the noise in the blob shape overwhelms any attempt to determine a person's features. We invite the reader to view an RTI video at ifTH as an example of the capabilities and limitations of the imaging modality. In addition, due to measurement noise and the simultaneous presence of multiple targets, objects and obstructions in the monitored area, spurious blobs (not corresponding to real targets) can appear in the image. For the same reasons, blobs corresponding to real targets can temporarily disappear from the image. These factors increase the difficulty of multiple target tracking, especially with intersecting trajectories.

We evaluate the performance of the multi-target tracking system in three different indoor environments, namely an open environment with no obstructions or objects, a one-bedroom apartment with internal walls, furniture and various objects, and a heavily cluttered office environment. The few multi-target tracking methods developed for RSS-based DFL are either non-real-time fT3lL lf!3lL track only two or three people E2L lH3lL lfT6l . ifTTl . assume that the number of targets is fixed and known a priori fT3lL fTTlL fT8lL or do not attempt to track targets with intersecting trajectories fT3lL fT5lL lfT6l , ifTTlL fT8ll . In an experimental test with four targets having separated trajectories, the method in [19] achieves an average error as low as 0.63 m, but needs 7.6 seconds to process an RTI image. Our method achieves an average error of 0.55 m in an experimental test with four targets having intersecting trajectories, needing 13.3 milliseconds to process an RTI image. To the best of our knowledge, we are the first to demonstrate that RF sensor networks can be used in real-world indoor environments to accurately track in real-time up to four targets even when their trajectories intersect multiple times. The highest average tracking error measured in the tests is 0.45 m with two targets, 0.46 m with three targets, and 0.55 m with four targets. Videos showing the performance of the multi-target tracking system during the tests described in this paper can be found at [14].

The remainder of the paper is organized as follows. We present the methods used to form radio tomographic images and to track multiple targets in Sections II and III, respectively. The experiments carried out are described in Section IV, and the results are listed and discussed in Section V. The related work is described in Section VI. Conclusions are drawn in Section VII.



II. Multi-channel RTI 

In this section, we describe how the RSS measurements collected on multiple channels are processed in real-time to form radio tomographic images. We deploy $R$ sensors at positions $\{z_r\}_{r=1}^{R}$. At time instant $k$, we measure the RSS $r_{l,c}(k)$, in dBm, of link $l$ on channel $c \in \{1, \dots, K\}$. Our objective is to estimate the change in the propagation field of the monitored area, $x$, from the RSS measurements collected on all the links of the network.

A. Fade level 

In obstructed environments, radio signals propagate from the transmitter to the receiver via multiple paths. At the receiver, the phasor sum of the waves impinging on the antenna determines the RSS. Depending on the relative phase of each wave, the waves may add constructively or destructively. As a wave's phase is a function of the center frequency and path length, the RSS is a function of the center frequency and of the positions of the communicating devices, an effect called multipath fading.



The relation between steady-state, narrow-band fading and the temporal fading statistics of the RSS due to human movement has been described in [13], where the authors define the concept of fade level, a continuum between two extremes: deep fade and anti-fade. For a link in a deep fade, when the link line is obstructed, the RSS, on average, increases. In addition, deep fade links show changes of the RSS even when the person is at some positions far away from the link line. Conversely, for a link in an anti-fade, when the link line is obstructed, the RSS, on average, decreases. Moreover, anti-fade links show changes of the RSS only when the person is in close proximity to the link line. Due to the limited size and predictable shape of their sensitivity area, anti-fade links are the most informative for DFL. In [12], [6], the channels used to form RTI images were sorted based on their fade level, and only the RSS measurements of the most anti-fade channels of each link were selected. In this work, we use channel diversity and propose a novel approach that considers the RSS measurements of all the channels and weights them based on their fade level.

During an initial calibration period performed in static conditions, i.e., when the monitored area is empty, we measure the average RSS of each link $l$ on each frequency channel $c$, $\bar{r}_{l,c}$, which can be modeled as:

$$\bar{r}_{l,c} = P_c + G_{l,c}, \tag{1}$$

where $P_c$ is the actual transmit power, in dBm, on channel $c$ and $G_{l,c}$ is the path gain, in dB, of link $l$ on channel $c$. For the wireless sensors used in the experiments (see Section IV-A), the actual transmit power $P_c$ differs from the nominal one and, most importantly for estimating the fade level, is a function of the frequency channel $c$, in part due to the difficulty of antenna impedance matching across a wide frequency band [20].

The fade level of channel $c$ for link $l$ is estimated as the difference between the path gain measured for channel $c$ and the minimum of the path gains measured for the used frequency channels:

$$F_{l,c} = G_{l,c} - \min_{c' \in \{1, \dots, K\}} G_{l,c'}. \tag{2}$$

Thus, for the same link $l$, channel $c_1$ is in a deeper fade than channel $c_2$ if $F_{l,c_1} < F_{l,c_2}$. Note that $F_{l,c} \geq 0$, and that $F_{l,c} = 0$ for one channel $c$ on each link.
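A minimal sketch of (1)-(2) for a single link, assuming the actual per-channel transmit powers $P_c$ are known (for example, measured once per device; how they are obtained is not specified here). The function name `fade_levels` and the example values are hypothetical.

```python
def fade_levels(mean_rss_dbm, tx_power_dbm):
    """Fade level F_{l,c} of one link on each of its K channels, following (1)-(2).

    mean_rss_dbm -- calibration average RSS per channel, r_bar_{l,c} (dBm)
    tx_power_dbm -- actual transmit power per channel, P_c (dBm); assumed known
    """
    # (1): path gain G_{l,c} = r_bar_{l,c} - P_c, in dB
    gains = [r - p for r, p in zip(mean_rss_dbm, tx_power_dbm)]
    deepest = min(gains)
    # (2): how far each channel sits above the deepest-fade channel (always >= 0)
    return [g - deepest for g in gains]


# Hypothetical calibration data for one link measured on K = 4 channels
print(fade_levels([-70.0, -62.0, -80.0, -66.0], [4.0, 3.0, 4.5, 2.0]))
# [10.5, 19.5, 0.0, 16.5]
```

The channel with $F_{l,c} = 0$ is the one in the deepest fade; larger values indicate channels closer to an anti-fade.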



B. Measurement model 

The RSS measurements collected on different channels are weighted based on the fade level defined in (2), bearing in mind that anti-fade channels provide measurements that are more informative for localization. The weighted average change in RSS of link $l$ at time $k$ is computed as:

$$y_l(k) = \frac{1}{\sum_{c=1}^{K} F_{l,c}} \sum_{c=1}^{K} F_{l,c} \, |\Delta r_{l,c}(k)|, \tag{3}$$
where $\Delta r_{l,c}(k)$ is the difference between the RSS of link $l$ on channel $c$ measured at time $k$, $r_{l,c}(k)$, and the reference RSS $\bar{r}_{l,c}$ measured during the calibration period.
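The following sketch evaluates (3) for one link and one time instant. The normalization by $\sum_c F_{l,c}$ reflects the reading of (3) as a weighted average; the fallback for the degenerate case in which all fade levels are zero is a hypothetical choice, as is the function name.

```python
def weighted_rss_change(rss_dbm, ref_rss_dbm, fade):
    """y_l(k) of (3): fade-level weighted average of |delta r_{l,c}(k)| over the K channels.

    rss_dbm     -- current RSS per channel, r_{l,c}(k)
    ref_rss_dbm -- calibration reference RSS per channel, r_bar_{l,c}
    fade        -- fade levels F_{l,c} from (2)
    """
    changes = [abs(r - ref) for r, ref in zip(rss_dbm, ref_rss_dbm)]
    total = sum(fade)
    if total == 0:
        # All channels equally faded: plain mean (an assumption, not from the text)
        return sum(changes) / len(changes)
    # Anti-fade channels (large F) dominate; the deepest-fade channel (F = 0) gets no weight
    return sum(f * d for f, d in zip(fade, changes)) / total


print(weighted_rss_change([-70.0, -60.0, -75.0], [-68.0, -65.0, -72.0], [2.0, 0.0, 6.0]))
# 2.75
```

In the example, the 5 dB change on the deepest-fade channel is ignored entirely, while the anti-fade channel (weight 6) dominates the result.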

C. Image estimation

The change in RSS is assumed to be a spatial integral of the propagation field of the monitored area. When the propagation field is discretized, some voxels affect the RSS of a specific link, whereas others do not. Thus, the change in RSS of each link is assumed to be a linear combination of the changes caused by each voxel:

$$y_l = \sum_{j=1}^{N} w_{lj} x_j + n_l, \tag{4}$$

where $x_j$ is the change in RSS caused by voxel $j$, $w_{lj}$ the weight of voxel $j$ for link $l$, $N$ the number of voxels in the discretized image and $n_l$ the noise of link $l$. The weight $w_{lj}$ indicates how each voxel of the image affects each link. For this, we use an ellipse model (U, (TTJ, lfT2l in which the transmitter and receiver are located at the foci of the ellipse. According to this model, a voxel $j$ at position $v_j$ that is located within the ellipse of link $l$ has its weight $w_{lj}$ set to a constant, which is inversely proportional to the area of the ellipse. Otherwise, its weight is set to zero:

$$w_{lj} = \begin{cases} \dfrac{1}{A_l} & \text{if } d_{lj}^{tx} + d_{lj}^{rx} < d_l + \lambda, \\ 0 & \text{otherwise,} \end{cases} \tag{5}$$

where $d_{lj}^{tx} = \|z_l^{tx} - v_j\|$ and $d_{lj}^{rx} = \|z_l^{rx} - v_j\|$ are the distances of voxel $j$ from the transmitter and the receiver of link $l$, $d_l = \|z_l^{tx} - z_l^{rx}\|$ is the link length, $A_l$ is the area of the ellipse, and $\lambda$ is its excess path length, i.e., the parameter defining the width of the ellipse.
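To make (5) concrete, the sketch below computes one weight. The ellipse area $A_l = \pi a b$ follows from the standard geometry of an ellipse whose distances to the two foci sum to $d_l + \lambda$; positions in metres and the function name are illustrative assumptions.

```python
import math


def ellipse_weight(voxel, tx, rx, excess=0.02):
    """w_{lj} of (5): 1/A_l inside the ellipse with foci at tx and rx, 0 outside.

    voxel, tx, rx -- (x, y) positions in metres; excess is lambda from Table I
    """
    d_l = math.dist(tx, rx)
    if math.dist(voxel, tx) + math.dist(voxel, rx) >= d_l + excess:
        return 0.0
    # Ellipse with foci-distance sum d_l + lambda: semi-major a, focal half-distance d_l / 2
    a = (d_l + excess) / 2
    b = math.sqrt(a * a - (d_l / 2) ** 2)
    return 1.0 / (math.pi * a * b)  # A_l = pi * a * b


tx, rx = (0.0, 0.0), (4.0, 0.0)
print(ellipse_weight((2.0, 0.0), tx, rx) > 0)  # True: voxel on the link line
print(ellipse_weight((2.0, 1.0), tx, rx))      # 0.0: voxel 1 m off the line
```

With $\lambda = 0.02$ m the ellipse is very thin, so only voxels almost exactly on the link line receive a nonzero weight.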

When all the links of the wireless network are considered, the model in (4) becomes:

$$y = Wx + n, \tag{6}$$

in which $y$ and $n$ are $M \times 1$ vectors representing the weighted RSS change and the noise of the $M$ wireless links, $W$ is the $M \times N$ matrix of the weights $w_{lj}$, and $x$ is the $N \times 1$ change in the propagation field to be estimated, where each element $x_j$ represents the change due to the presence of a person in voxel $j$. The linear model for the change in the propagation field is based on the correlated shadowing models in EB, (221 and on the work in Q.

Since estimating the image vector $x$ from the links' measurements $y$ is an ill-posed inverse problem, regularization is required. In this work, we use a regularized least-squares approach [21], [23]:

$$\hat{x} = \Pi y, \tag{7}$$

where:

$$\Pi = \left(W^T W + \sigma_N^2 C_x^{-1}\right)^{-1} W^T, \tag{8}$$



in which $\sigma_N^2$ is the regularization parameter. The a priori covariance matrix $C_x$ is calculated by using an exponential spatial decay:

$$[C_x]_{ij} = \sigma_x^2 \, e^{-\|v_i - v_j\| / \delta_c}, \tag{9}$$

where $\sigma_x^2$ is the variance of voxel measurements, and $\delta_c$ is the voxels' correlation distance. The linear transformation $\Pi$ is computed only once before real-time operation. The calculation of $\hat{x}$ in (7) then requires $MN$ operations and can be performed in real-time.
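A dependency-free sketch of (7)-(9) on a toy problem, assuming small dense matrices: the Gauss-Jordan inversion used here is only practical for a handful of voxels, and a real deployment would rely on a numerical linear algebra library. Default parameter values follow Table I; the helper names are hypothetical.

```python
import math


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def inverse(A):
    """Gauss-Jordan inversion with partial pivoting (small dense matrices only)."""
    n = len(A)
    M = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return [row[n:] for row in M]


def projection_matrix(W, voxels, sigma_x=0.2236, sigma_n=1.0, delta_c=3.0):
    """Pi of (8), with the prior covariance C_x of (9); computed once, offline."""
    N = len(voxels)
    Cx = [[sigma_x ** 2 * math.exp(-math.dist(vi, vj) / delta_c) for vj in voxels]
          for vi in voxels]
    Cx_inv = inverse(Cx)
    Wt = transpose(W)
    WtW = matmul(Wt, W)
    A = [[WtW[i][j] + sigma_n ** 2 * Cx_inv[i][j] for j in range(N)] for i in range(N)]
    return matmul(inverse(A), Wt)  # N x M


def estimate_image(Pi, y):
    """x_hat = Pi y of (7): one N x M matrix-vector product per frame."""
    return [sum(p * yl for p, yl in zip(row, y)) for row in Pi]


# Toy example: 2 links over 3 voxels in a row; link 0 covers voxels 0-1, link 1 covers 1-2
W = [[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]]
voxels = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
Pi = projection_matrix(W, voxels)
x_hat = estimate_image(Pi, [1.0, 0.0])  # only link 0 sees a change
print([round(v, 3) for v in x_hat])     # largest value on voxel 0, smallest on voxel 2
```

Splitting the work this way mirrors the text: `projection_matrix` is the one-off offline step, while `estimate_image` is the only per-frame cost.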



D. Image denoising 

An RTI image representing the situation in which $n$ targets are located in different regions of the monitored area should ideally show $n$ blobs, i.e., regions in which the voxels have an intensity much higher than that of the surrounding voxels. However, due to noise in the RSS measurements and the simultaneous presence of multiple individuals and obstructions, the RTI images are often noisy, showing multiple small spurious blobs which do not correspond to actual targets. For this reason, a Gaussian filter is applied to the RTI image. This operation attenuates the image's high spatial frequency components, filtering out small spurious blobs while keeping the larger blobs corresponding to actual targets [24].

The filtering is obtained by convolving the RTI image with an isotropic Gaussian kernel:

$$G(x, y) = \frac{1}{2\pi\sigma_G^2} \, e^{-\frac{x^2 + y^2}{2\sigma_G^2}}, \tag{10}$$

where $\sigma_G = 1$ m is the standard deviation of the Gaussian kernel, which determines how much the image is filtered (or blurred). Since the estimated RTI image $\hat{x}$ is stored as a set of discrete voxels, we need a discrete approximation of the Gaussian kernel $G$ to perform the convolution. Thus, the kernel is truncated after $\lfloor r_G/p + 0.5 \rfloor$ voxels, where $r_G = 0.75$ m is the radius of the kernel and $p$ is the pixel width (see Table I). The low-pass filtered RTI image $x_G$ is then calculated as:

$$x_G = \hat{x} * G, \tag{11}$$

where $*$ represents the convolution operator. As a result, each voxel of the filtered RTI image is a weighted average of the voxel's neighborhood, with the central voxels weighted more than the peripheral ones. The values of $\sigma_G$ and $r_G$ are chosen so as to fit the size of the blobs corresponding to the targets.
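As an illustration of (10)-(11), the sketch below builds the truncated kernel on a grid of square voxels of width $p$ (Table I) and convolves it with a 2-D image. Renormalizing the truncated kernel to unit sum and zero padding at the borders are assumptions not stated in the text; the function names are hypothetical.

```python
import math


def gaussian_kernel(sigma_g=1.0, r_g=0.75, pixel=0.1524):
    """Discrete version of (10) on the voxel grid, truncated after floor(r_G/p + 0.5) voxels."""
    half = math.floor(r_g / pixel + 0.5)
    k = [[math.exp(-((i * pixel) ** 2 + (j * pixel) ** 2) / (2 * sigma_g ** 2))
          for j in range(-half, half + 1)]
         for i in range(-half, half + 1)]
    total = sum(map(sum, k))
    # Renormalize so the truncated weights sum to 1 (replaces the 1/(2 pi sigma_G^2) factor)
    return [[v / total for v in row] for row in k]


def convolve2d(image, kernel):
    """x_G = x_hat * G of (11); voxels outside the monitored area count as zero (assumption)."""
    rows, cols, h = len(image), len(image[0]), len(kernel) // 2
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            acc = 0.0
            for i in range(-h, h + 1):
                for j in range(-h, h + 1):
                    if 0 <= r - i < rows and 0 <= c - j < cols:
                        acc += image[r - i][c - j] * kernel[i + h][j + h]
            out[r][c] = acc
    return out


G = gaussian_kernel()
print(len(G))  # 11: floor(0.75 / 0.1524 + 0.5) = 5 voxels on each side of the centre
```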

III. Multiple Target Tracking 

The methods described in the previous section form radio 
tomographic images in real-time, i.e., pictures of the change 
in the propagation field of the monitored area caused by the 
presence and movements of the people found in it. These 
images can be considered as frames of a video showing the 
movements of multiple targets. In this section, we describe the 
methods used to process the RTI images and track multiple 
targets in real-time. Our objective is to correctly detect the entrance and exit of the targets and estimate their position $v_t$ by assigning to each of them a voxel of the estimated RTI image.

A. Thresholding 

After denoising the image, an additional filter is required in order to reduce the size of the set of voxels that go through the clustering process (see Section III-B) and to preserve only those in the regions occupied by the targets. For this purpose, a dynamic threshold $T_t$ is set as follows.

During the calibration period, i.e., when the monitored area is empty, we calculate the average maximum intensity of the formed RTI images, $I_e$. When the monitored area is empty, the threshold $T_t$ is set to $2I_e$ to filter out the voxels having very low intensity. On the other hand, when targets are being tracked, we derive the minimum intensity, $I_{\min}$, of the targets $t$ in the set of estimated targets $\mathcal{T} = \{t_1, \dots, t_{|\mathcal{T}|}\}$ as:

$$I_{\min} = \min_{t \in \mathcal{T}} [x_G]_t, \tag{12}$$

where $[x_G]_t$ denotes the intensity of the voxel assigned to target $t$. We then low-pass filter $I_{\min}$:

$$I_f(k) = \alpha_f I_f(k-1) + (1 - \alpha_f) I_{\min}(k), \tag{13}$$

where $\alpha_f = 0.9$. Thus, the threshold $T_t$ is set as follows:

$$T_t(k) = \begin{cases} \beta I_f(k) & \text{if } |\mathcal{T}(k)| > 0, \\ 2 I_e & \text{otherwise,} \end{cases} \tag{14}$$

where $|\mathcal{T}(k)|$ is the number of targets found in the monitored area at time instant $k$ and $\beta < 1$. In the experiments, we did not observe any significant performance change for values of $\beta$ in the range $[0.75, 0.9]$. The online updating of $T_t$ ensures that the voxels surrounding the tracks are not filtered out and go through the clustering process.

A vector mask $M_t$ of size $N \times 1$ is created as follows:

$$[M_t]_n = \begin{cases} 1 & \text{if } [x_G]_n > T_t, \\ 0 & \text{otherwise.} \end{cases} \tag{15}$$

The denoised and filtered RTI image $x_f$ is finally calculated as the element-wise product of the denoised image $x_G$ and $M_t$:

$$x_f = x_G \circ M_t. \tag{16}$$
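The sketch below strings (12)-(16) together for one frame. Two details are assumptions not taken from the text: $I_f$ is initialized with the first available $I_{\min}$, and it is left unchanged while no targets are tracked. The defaults $\alpha_f = 0.9$ and $\beta = 0.8$ follow the text and Table II; the class and function names are hypothetical.

```python
class DynamicThreshold:
    """Threshold T_t of (12)-(14)."""

    def __init__(self, empty_intensity, alpha_f=0.9, beta=0.8):
        self.i_e = empty_intensity  # I_e from the calibration period
        self.alpha_f = alpha_f
        self.beta = beta
        self.i_f = None  # I_f(k-1); None until the first target is tracked (assumption)

    def update(self, target_intensities):
        """target_intensities: [x_G] at each tracked target's voxel (empty list if none)."""
        if not target_intensities:
            return 2.0 * self.i_e  # (14), empty area; I_f is kept as is (assumption)
        i_min = min(target_intensities)  # (12)
        if self.i_f is None:
            self.i_f = i_min
        else:
            self.i_f = self.alpha_f * self.i_f + (1 - self.alpha_f) * i_min  # (13)
        return self.beta * self.i_f  # (14), targets present


def apply_mask(x_g, threshold):
    """(15)-(16): zero every voxel whose intensity does not exceed T_t."""
    mask = [1 if v > threshold else 0 for v in x_g]
    return [v * m for v, m in zip(x_g, mask)]


th = DynamicThreshold(empty_intensity=0.5)
print(th.update([]))                     # 1.0, i.e. 2 * I_e
print(th.update([2.0, 3.0]))             # 1.6, i.e. beta * lowest target intensity
print(apply_mask([0.2, 1.7, 3.0], 1.6))  # [0.0, 1.7, 3.0]
```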

B. Clustering 

In the clustering phase, voxels are assigned to each blob found in the image. Since we do not make any a priori assumption on the number of targets to be tracked (new targets can enter the monitored area at any time, tracked targets can leave it, and spurious blobs can also appear), clustering algorithms for which the number of clusters must be known a priori, such as k-means [25], cannot be applied. For this reason, we use a hierarchical agglomerative clustering (HAC) algorithm [26].

We define $V$ as the set of voxels that are not filtered out in the thresholding phase (see Section III-A), i.e., $V = \{j : [x_f]_j > T_t\}$. Each voxel $j \in V$ has coordinates $v_j = [x_j, y_j]$ in the XY plane. In the HAC algorithm, each voxel is initially considered as an independent cluster. At each iteration, the two closest clusters are merged. The distance between two clusters, $S_a \subset V$ and $S_b \subset V$, is measured with the average linkage distance, i.e., the average of the Euclidean distances between all the pairs of voxels assigned to the two clusters:

$$d(S_a, S_b) = \frac{1}{|S_a||S_b|} \sum_{j_a \in S_a} \sum_{j_b \in S_b} \|v_{j_a} - v_{j_b}\|. \tag{17}$$

The iterations terminate when the minimum of the average linkage distances among the clusters is larger than a threshold $T_c$, which determines the average size of the formed clusters and ultimately their number (i.e., several small clusters for low values of $T_c$, few larger clusters for high values of $T_c$).
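A direct, unoptimized sketch of the HAC step with the average linkage distance of (17) and the stopping threshold $T_c$ (default 1.25 m from Table II). It recomputes all pairwise cluster distances at every iteration, which is adequate only for the small sets of voxels that survive thresholding; the function names are hypothetical.

```python
import math


def average_linkage(a, b, pos):
    """d(S_a, S_b) of (17): mean Euclidean distance over all voxel pairs of two clusters."""
    total = sum(math.dist(pos[i], pos[j]) for i in a for j in b)
    return total / (len(a) * len(b))


def hac(voxel_ids, pos, t_c=1.25):
    """Merge the two closest clusters until the closest pair is farther apart than T_c."""
    clusters = [[v] for v in voxel_ids]  # every voxel starts as its own cluster
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = average_linkage(clusters[i], clusters[j], pos)
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > t_c:
            break  # stopping rule: minimum average linkage distance exceeds T_c
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


pos = {0: (0.0, 0.0), 1: (0.3, 0.0), 2: (0.0, 0.3), 3: (5.0, 5.0), 4: (5.3, 5.0)}
print(hac([0, 1, 2, 3, 4], pos))  # [[0, 1, 2], [3, 4]]
```

Because the number of clusters is never fixed in advance, two well-separated groups of voxels end up as two clusters, and a third group would simply produce a third cluster.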



We normalize the intensity of the image to the range $[0, 1]$, so that the normalized intensity $I_j$ of each voxel $j \in V$ is:

$$I_j = \frac{[x_f]_j}{\max_{i \in V} [x_f]_i}. \tag{18}$$

For each formed cluster $S_i$, the voxel $h_i \in S_i$ having the maximum normalized intensity is selected as the cluster head:

$$h_i = \arg\max_{j \in S_i} I_j. \tag{19}$$

We define the set of original cluster heads $\mathcal{H}$ as the set of voxels $h_i$ for all $i$.
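The normalization and head selection of (18)-(19) reduce to a few lines. This sketch assumes that (18) divides by the maximum intensity over $V$, and that all intensities in $V$ are positive (they exceed the threshold $T_t$), so the normalized values fall in $[0, 1]$. The function name is hypothetical.

```python
def cluster_heads(clusters, intensity):
    """Return the cluster heads h_i of (19) and the normalized intensities I_j of (18).

    clusters  -- lists of voxel ids, e.g. the output of the HAC step
    intensity -- voxel id -> [x_f]_j (assumed positive for every voxel in V)
    """
    peak = max(intensity[j] for c in clusters for j in c)
    norm = {j: intensity[j] / peak for c in clusters for j in c}  # (18), max-normalization
    heads = [max(c, key=lambda j: norm[j]) for c in clusters]  # (19)
    return heads, norm


heads, norm = cluster_heads([[0, 1, 2], [3, 4]], {0: 2.0, 1: 4.0, 2: 1.0, 3: 3.0, 4: 0.5})
print(heads)  # [1, 3]
```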

C. Cluster heads selection 

Due to the 3D shape of the blobs found in the RTI image (typically round in open environments, with more distorted shapes in obstructed areas and when target trajectories intersect), the HAC algorithm can form several cluster heads within the same blob. In this case, we want to decrease the number of elements in $\mathcal{H}$ in order to reduce the complexity of the observation-target association problem (see Section III-D2), and simultaneously keep, for every occupied region, only the cluster heads with the highest intensities, which are more likely to correspond to the real targets found in the monitored area. As a result of the selection process, a new set $\mathcal{H}_1$ of cluster heads having higher intensities is formed.

In each indoor environment where the sensors are deployed, we define $\mathcal{R}_e \subset V$ as the set of voxels included in the entrance/exit region, i.e., the region the targets must go through in order to enter and exit the monitored area. This region can be limited to a specific part of the monitored area (as in the apartment described in Section IV-B2), or can cover the entire region along the perimeter of the monitored area (as in the open and office environments in Sections IV-B1 and IV-B3). All the cluster heads that are located within $\mathcal{R}_e$ are included in $\mathcal{H}_1$, regardless of their intensity.

The remaining cluster heads are selected based on their proximity to the targets being tracked and their intensity. As a first step, gating is applied to the cluster heads in $\mathcal{H}$. At time $k$, for each cluster head $h$ at position $v_h$, we define $\mathcal{X}_h$ as:

$$\mathcal{X}_h = \{t : \|v_h - v_t\| < r_t\}, \tag{20}$$

where $v_t$ is the position of the cluster head associated to target $t$ at time $k-1$. The parameter $r_t$ represents the radius of the gating area centered at $v_t$. Initially, $r_t = r$ (see Table II). The value of $r_t$ is modified whenever the trajectory of target $t$ intersects the trajectory of another track (see Section III-F). The gating area has to accommodate the motion variance of the targets, i.e., how fast the targets can move, and the typical noise of RTI images (i.e., spurious and disappearing blobs, and blobs merging and splitting in case of intersecting trajectories).

Based on the results of the gating process, a cluster head $h$ is included in $\mathcal{H}_1$ if a) $|\mathcal{X}_h| \geq 1$, i.e., if there is at least one target $t$ whose gating area includes $h$, and b) the normalized intensity of $h$ is larger than a threshold $T_h$, defined as:

$$T_h = \rho \min_{t \in \mathcal{X}_h} I_t, \tag{21}$$

where $\rho = 0.8$.
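A sketch of the selection rules of this section, turning the cluster heads $\mathcal{H}$ into $\mathcal{H}_1$. The track representation (a dictionary holding the previous position $v_t$, the gating radius $r_t$ and the normalized intensity $I_t$) and the function name are hypothetical choices made for the example.

```python
import math


def select_cluster_heads(heads, pos, norm, tracks, entrance, rho=0.8):
    """Reduce the cluster heads H to the selected set H_1 (Section III-C).

    heads    -- cluster-head voxel ids (H)
    pos      -- voxel id -> (x, y) position
    norm     -- voxel id -> normalized intensity I_j from (18)
    tracks   -- hypothetical track records: dicts with 'pos' (v_t at k-1),
                'radius' (r_t) and 'intensity' (I_t)
    entrance -- set of voxel ids in the entrance/exit region R_e
    """
    selected = []
    for h in heads:
        if h in entrance:
            selected.append(h)  # kept regardless of intensity
            continue
        gated = [t for t in tracks if math.dist(pos[h], t["pos"]) < t["radius"]]  # X_h, (20)
        if gated and norm[h] > rho * min(t["intensity"] for t in gated):  # (21)
            selected.append(h)
    return selected


pos = {0: (0.0, 0.0), 1: (3.0, 0.0), 2: (10.0, 10.0), 3: (0.5, 0.0)}
norm = {0: 0.9, 1: 0.95, 2: 0.2, 3: 0.5}
tracks = [{"pos": (0.0, 0.0), "radius": 2.0, "intensity": 0.8}]
print(select_cluster_heads([0, 1, 2, 3], pos, norm, tracks, entrance={2}))  # [0, 2]
```

Head 1 is bright but outside every gating area, head 3 is gated but too dim ($0.5 \leq 0.8 \cdot 0.8$), and head 2 survives only because it lies in the entrance/exit region.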
D. Target tracking

1) Track confirmation and deletion: The selected cluster heads in $\mathcal{H}_1$ are considered for updating the existing tracks and for potentially initiating new tracks. In machine vision, track confirmation and deletion are usually determined by rules [27]. In our case, the rules have to deal with RTI images that show new blobs when people enter the monitored area and stop showing blobs when people exit it. RTI images can show spurious blobs not corresponding to people, while blobs corresponding to real targets can temporarily disappear, as shown in the videos in [14]. Due to this, a trade-off exists between the reactivity of the system to the entrance and exit of targets and its sensitivity to the noise in the images.

At each frame, the data association methods presented in Section III-D2 assign some of the cluster heads in $\mathcal{H}_1$ to the already existing tracks. Of the cluster heads which are not assigned, only those located in $\mathcal{R}_e$ are considered as new candidate tracks, i.e., tracks potentially corresponding to new targets, whereas the ones located outside of $\mathcal{R}_e$ are considered as noise and are discarded. A candidate track becomes a confirmed track only if it has been assigned a cluster head (see Section III-D2) at least $n_{app}$ times in the last $m$ frames ($n_{app} < m$). The value of $n_{app}$ makes the system more or less reactive to the entrance of new targets. By using this rule, the system confirms the entrance of a new target only after having multiple confirmations of its presence. This introduces a small latency between the moment a new target enters the monitored area and the moment the system acknowledges it, as shown in Figure 3. However, the rule makes the system more robust to the appearance of spurious blobs in $\mathcal{R}_e$. Since the DFL system used in the experiments produces approximately 10 RTI images (or frames) per second, we set $m = 10$ so as to consider the appearances of the tracks in a one-second time interval before confirming their presence.

On the other hand, a track (whether confirmed or candidate) is deleted after it has not been assigned a cluster head in the last $n_{del}$ consecutive frames. Also in this case, the DFL system deletes tracks with a small delay compared to reality. However, the rule prevents the system from incorrectly deleting tracks that have not been associated with a cluster head for a few consecutive frames due to noise in the RTI image.
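The confirmation and deletion rules can be captured by a small per-track counter. In this sketch, $m = 10$ follows the text, while $n_{app}$ and $n_{del}$ are left as constructor arguments because their values are not given here; the class name and the example values are hypothetical.

```python
from collections import deque


class TrackState:
    """Bookkeeping for the confirmation and deletion rules of Section III-D1."""

    def __init__(self, n_app, n_del, m=10):
        if not 0 < n_app < m:
            raise ValueError("need 0 < n_app < m")
        self.n_app, self.n_del = n_app, n_del
        self.hits = deque(maxlen=m)  # 1 if a cluster head was assigned in that frame
        self.misses = 0              # consecutive frames without an assigned cluster head
        self.confirmed = False

    def step(self, assigned):
        """Record one frame; return False when the track has to be deleted."""
        self.hits.append(1 if assigned else 0)
        self.misses = 0 if assigned else self.misses + 1
        if not self.confirmed and sum(self.hits) >= self.n_app:
            self.confirmed = True  # candidate becomes a confirmed track
        return self.misses < self.n_del


track = TrackState(n_app=3, n_del=5)  # hypothetical values
for seen in (True, False, True, True):
    track.step(seen)
print(track.confirmed)  # True: 3 assignments within the last 10 frames
```

The bounded `deque` implements the sliding window of $m$ frames, so old assignments are forgotten automatically.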

2) Target tracking: The problem of tracking multiple targets can be formulated as a data assignment problem (DAP), in which at each frame a set of new observations has to be assigned to the set of existing targets, as shown in Figure 1. In our DFL system, at each newly formed RTI image, the set of selected cluster heads $\mathcal{H}_1 = \{h_1, \dots, h_{|\mathcal{H}_1|}\}$ has to be assigned to the set of estimated targets $\mathcal{T} = \{t_1, \dots, t_{|\mathcal{T}|}\}$. The solution of this problem consists in finding the optimal permutation $\pi$ of the set $\mathcal{H}_1$, where the permutation matrix $\Phi$ is defined as:

$$[\Phi]_{h,t} = \begin{cases} 1 & \text{if } t = \pi(h), \\ 0 & \text{otherwise.} \end{cases} \tag{22}$$

Fig. 1. DAP in RTI (figure: observations and targets). The dashed circles represent the gating areas centered at the tracks. In this case, the observation-track assignments are: $o_1 \to t_2$, $o_3 \to t_1$, $o_4 \to t_3$ and $o_5 \to t_4$. Observation $o_2$ is not assigned and is considered for starting a new track. The radius of the gating areas centered at $t_3$ and $t_4$ is larger as these two targets are intersecting.

Based on the outcome of the gating process (see Section III-C), an association matrix $\Omega$ is defined as follows:

$$[\Omega]_{h,t} = \begin{cases} \|v_h - v_t\| & \text{if } t \in \mathcal{X}_h, \\ \infty & \text{otherwise.} \end{cases} \tag{23}$$

The elements set to $\infty$ represent infeasible observation-target assignments. When $\Omega$ is interpreted as a cost matrix, the DAP becomes the problem of selecting the observation-target assignments that minimize the total cost.
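The sketch below builds a cost matrix in the form of (23) and solves the DAP in two ways: greedily, in the spirit of the SNN method of Section III-D4, and optimally by exhaustive search. The exhaustive search is a stand-in for the Hungarian algorithm used by GNN and is only viable for a few targets; the Euclidean-distance cost for feasible pairs, the rule of first maximizing the number of assigned tracks, and all names are assumptions.

```python
import itertools
import math

INF = math.inf


def cost_matrix(head_pos, track_pos, radii):
    """Omega in the form of (23): rows are cluster heads, columns are tracks."""
    def cost(h, t, r):
        d = math.dist(h, t)
        return d if d < r else INF  # outside the gate: infeasible
    return [[cost(h, t, r) for t, r in zip(track_pos, radii)] for h in head_pos]


def greedy_assign(cost):
    """Greedy (SNN-style): take the cheapest remaining feasible pair until none is left."""
    pairs = sorted((c, i, j) for i, row in enumerate(cost) for j, c in enumerate(row) if c < INF)
    used_h, used_t, out = set(), set(), {}
    for _, i, j in pairs:
        if i not in used_h and j not in used_t:
            out[i] = j
            used_h.add(i)
            used_t.add(j)
    return out


def optimal_assign(cost):
    """Exhaustive stand-in for GNN: most assigned tracks first, then least total cost."""
    n_h = len(cost)
    n_t = len(cost[0]) if n_h else 0
    best_key, best = (0, 0.0), {}
    options = list(range(n_h)) + [None] * n_t  # None = track left without an observation
    for perm in itertools.permutations(options, n_t):
        # perm[j] is the observation given to track j
        pairs = {i: j for j, i in enumerate(perm) if i is not None and cost[i][j] < INF}
        key = (-len(pairs), sum(cost[i][j] for i, j in pairs.items()))
        if key < best_key:
            best_key, best = key, pairs
    return best


cost = [[1.0, 2.0], [2.0, INF]]  # observation 1 can only go to track 0
print(greedy_assign(cost))   # {0: 0}: track 1 stays unassigned
print(optimal_assign(cost))  # {1: 0, 0: 1}
```

In the example, the greedy choice of the cheapest pair (observation 0 to track 0) blocks observation 1, whose only feasible track is then taken; the exhaustive search instead assigns both tracks at a higher total cost.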

3) Prior work: The DAP in multiple target tracking has received significant attention from the research community, and several methods, including particle filters (PF, [28]), probabilistic and joint probabilistic data association methods (PDA and JPDA, [29]) and multiple hypothesis tracking (MHT, [30]), have been proposed and analyzed OTTl . However, these methods present some limitations that make them unsuitable for the requirements of our DFL system, which are a) no a priori assumptions on the number of targets and b) real-time operation.

The PDA and JPDA approaches assume that the number of targets is known and constant. Moreover, the PF and MHT approaches struggle to meet strict real-time requirements: the PF needs a high number of particles to obtain accurate tracking, at the expense of a high computational complexity and long processing time fT3lL 11711 . In addition, whenever the number of targets to be tracked increases, the PF needs a higher number of particles to maintain the same accuracy, making the processing time even longer. In MHT, each feasible target-observation association is considered as a hypothesis with a specific probability. At each new RTI image, each hypothesis is expanded into a set of new hypotheses having specific probabilities, so that a tree of hypotheses is incrementally generated. Each hypothesis is expressed as a permutation matrix, and the method keeps a probability distribution over the space of all permutation matrices. After several frames have been analyzed, the tracks having the highest probabilities are selected. However, the number of hypotheses grows exponentially with the number of targets in $\mathcal{T}$, making this method computationally intensive [32]. Furthermore, the MHT approach is a batch method, which postpones the DAP solution until clearer information is available. In contrast, we aim at using a recursive method that estimates the targets' positions at time $k$ based on the observations available at time $k$ and the estimates of the targets' positions at time $k-1$.

4) Nearest neighbor methods: In this work, two computationally efficient versions of the nearest neighbor (NN) approach, namely the global nearest neighbor (GNN) and the greedy (or suboptimal) nearest neighbor (SNN), are tailored to the characteristics of RTI. Despite its simplicity, the NN approach has demonstrated high tracking performance in scenarios characterized by noisy measurements, such as radar and sonar applications (33), OH. In GNN and SNN, the tracks are updated at each frame. A track can be associated with only one observation, and in turn one observation can be associated with only one track.

The GNN method applies the Hungarian algorithm [35], [36], which finds the optimal solution in polynomial time, $O(n^3)$, where $n = \min(|\mathcal{T}|, |\mathcal{H}_1|)$. The SNN method applies a greedy approach to the selection of the observation-target assignments, which in $O(n)$ time is not guaranteed to find the optimal solution but requires fewer computations than the GNN method (especially when the number of feasible assignments is large). For the greedy SNN method, the upper bound of the total cost associated with the selection of the observation-target assignments is twice the optimal cost obtained by the GNN method. The observations that are not assigned to any target are considered for starting new tracks (see Section III-D1). On the other hand, the tracks that are not assigned an observation are still predicted by using the Kalman filter (KF).

TABLE I
Image reconstruction parameters

Parameter | Value | Description
$p$ | 0.1524 | Pixel width [m]
$\lambda$ | 0.02 | Ellipse excess path length [m]
$\sigma_x$ | 0.2236 | Voxel standard deviation [dB]
$\sigma_N$ | 1 | Noise standard deviation [dB]
$\delta_c$ | 3 | Voxel correlation distance

E. Kalman filter tracking

Whenever a candidate track $t$ is confirmed, a track-specific KF [37], [38] is initialized and recursively applied to track its movements. The system runs the KFs in parallel. Each KF estimates the new state, i.e., position, of a track by taking into account its previous state and the new observation, i.e., cluster head, associated with it (see Section III-D2). We assume that the targets to be tracked move as a Brownian process and that the measurement noise is Gaussian. The KF smooths the trajectories of the targets and reduces their sudden changes of direction. For this reason, it is particularly useful in the case of noisy RTI images.
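A minimal sketch of one track-specific KF under the stated assumptions (Brownian, i.e. random-walk, motion and Gaussian measurement noise), with the state reduced to the 2-D position and an isotropic covariance. The process and measurement variances `q` and `r`, the initial variance `p0` and the class name are hypothetical; the text does not give their values. When no cluster head is associated with the track, only the prediction step runs.

```python
class RandomWalkKF:
    """Per-track Kalman filter on the 2-D position with a random-walk motion model."""

    def __init__(self, pos, q=0.05, r=0.1, p0=1.0):
        self.x = list(pos)  # position estimate [m]
        self.p = p0         # variance, shared by both axes (isotropic assumption)
        self.q, self.r = q, r

    def step(self, z=None):
        """Predict, then update with the associated cluster head position z (if any)."""
        self.p += self.q  # random walk: mean unchanged, uncertainty grows
        if z is not None:
            k = self.p / (self.p + self.r)  # Kalman gain
            self.x = [xi + k * (zi - xi) for xi, zi in zip(self.x, z)]
            self.p *= 1 - k
        return tuple(self.x)


kf = RandomWalkKF((0.0, 0.0), q=0.0, r=1.0, p0=1.0)
print(kf.step((2.0, 0.0)))  # (1.0, 0.0): equal trust in prior and measurement
print(kf.step())            # (1.0, 0.0): no observation, prediction only
```

A larger `q` relative to `r` lets the estimate follow the cluster heads more closely; a smaller one smooths the trajectory more strongly, which is the effect the text relies on for noisy RTI images.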



TABLE II
Multiple target tracking parameters (default values)

Parameter | Value | Description
$\beta$ | 0.80 | Parameter used to set $T_t$ in (14)
$T_e$ | 0.01 | Empty area intensity threshold
$T_c$ | 1.25 | Voxel clustering threshold [m]
$T_i$ | 2 | Intersecting trajectories threshold [m]
$r$ | 2 | Radius of the gating area [m]
$r_G$ | 0.75 | Radius of the Gaussian kernel [m]
$\sigma_G$ | 1 | Std. deviation of the Gaussian kernel [m]



F. Handling intersecting trajectories 

In this work, we tackle the problem of tracking targets with intersecting trajectories. In RTI, this situation manifests itself as two (or more) blobs slowly merging into a single one and then splitting again after some frames. In cluttered indoor environments, when two targets approach each other, the RTI images become very noisy due to the unpredictable overlap of the multiple propagation paths modified by each target. For this reason, whenever the distance between a target t and any other target drops below T_i, r_t in (20) is doubled. With this adjustment, the motion variance of the targets is increased. Since in intersecting situations the merging-splitting blobs can disappear for several consecutive frames, increasing the radius of the gating area avoids losing track of the targets.
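The adjustment above can be sketched as follows. This is a minimal sketch, assuming circular gating (an observation is a candidate for a track only if it lies within the gating radius of the track's predicted position) and using the default values of Table II (T_i = 2 m, r = 2 m); the function names are hypothetical, and the exact form of Eq. (20) is not reproduced here.

```python
import math


def gating_radius(track, others, r=2.0, t_i=2.0):
    """Gating radius [m] for `track`, doubled while any other target is
    closer than the intersecting-trajectories threshold t_i [m]."""
    near = any(math.dist(track, o) < t_i for o in others)
    return 2.0 * r if near else r


def gate(track, others, observations, r=2.0, t_i=2.0):
    """Observations (cluster heads) that fall inside the track's gating area."""
    radius = gating_radius(track, others, r, t_i)
    return [z for z in observations if math.dist(track, z) <= radius]
```

For example, an observation 3 m away from a track is rejected while the other targets are far away, but accepted once another target comes within 2 m, which is when merging blobs are most likely to drift or disappear.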

IV. Experimental Setup 

In this section, we describe the hardware and communication protocol used in the experiments and the environments in which we carried out the tests. The values of the image reconstruction parameters and of the multiple target tracking parameters are listed in Tables I and II, respectively.

A. Hardware and communication protocol 

The sensors used in all the experiments are TI CC2531 USB dongle nodes, transmitting at their maximum nominal power, i.e., 4.5 dBm [39]. For multi-channel communication, the sensors run multi-Spin, a multi-channel token passing protocol introduced in [12]. In multi-Spin, the sensors transmit in TDMA fashion based on their ID numbers. Each transmitted packet contains the ID number of the transmitting node and the most recent RSS measurements of the packets received from the other sensors. At the end of each communication cycle, the sensors switch synchronously to the next frequency channel in a user-defined list. On average, the time interval between two consecutive transmissions is 2.9 ms. A sink node that overhears all the packets transmitted by the nodes stores the RSS measurements for processing.
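To illustrate the resulting timing, the following sketch builds an idealized multi-Spin-like schedule: nodes transmit in ID order, one transmission every 2.9 ms (the average interval reported above), and all nodes hop to the next channel in the list at the end of each cycle. It assumes perfect slot timing and no packet loss, which the real protocol does not guarantee; the function name is hypothetical.

```python
def tdma_schedule(num_nodes, channels, slot_ms=2.9, cycles=1):
    """Idealized round-robin TDMA schedule with per-cycle channel hopping.

    Returns a list of (time_ms, node_id, channel) tuples; node IDs start at 1.
    """
    schedule = []
    t = 0.0
    for k in range(cycles):
        channel = channels[k % len(channels)]  # synchronous hop at each new cycle
        for node in range(1, num_nodes + 1):
            schedule.append((round(t, 3), node, channel))
            t += slot_ms
    return schedule


# 30 nodes hopping over the channels used in the open environment.
sched = tdma_schedule(30, [11, 15, 18, 22, 26], cycles=2)
cycle_ms = sched[30][0] - sched[0][0]  # one full cycle on one channel: 30 * 2.9 ms
```

The idealized cycle length of 30 × 2.9 = 87 ms is of the same order as the cycle length of approximately 100 ms for about 30 nodes quoted in the results.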



[Figure: (a)-(c) deployment setups; (d) apartment environment; (e) office environment]

Fig. 2. In (a)-(c), the setups in the three indoor environments used for the experiments. The circles represent the sensors, while the dashed lines represent the paths followed by the people moving in the monitored area. The light gray rectangles in (c) are the rows of desks in the office environment. The dark gray squares in (a) and (c) represent a concrete pillar. In (d), an image of the apartment. In (e), an image of the office environment.



The CC2531 nodes transmit in the 2.4 GHz ISM band on one of 16 selectable frequency channels, which are 5 MHz apart, as specified by the IEEE 802.15.4 standard [40]. The carrier frequency (in MHz) of channel c is:

$$
f_c = 2405 + 5\,(c - 11), \qquad c \in [11, 26]. \tag{24}
$$
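Eq. (24) translates directly into code; the sketch below (function name hypothetical) also lists the carrier frequencies of the channels used in the open and office environments.

```python
def channel_frequency_mhz(c):
    """Carrier frequency [MHz] of IEEE 802.15.4 channel c in the 2.4 GHz band, Eq. (24)."""
    if not 11 <= c <= 26:
        raise ValueError("2.4 GHz IEEE 802.15.4 channels are numbered 11 to 26")
    return 2405 + 5 * (c - 11)


# Channels used in the open and office environments (Section IV-B).
open_env = {c: channel_frequency_mhz(c) for c in (11, 15, 18, 22, 26)}
# -> {11: 2405, 15: 2425, 18: 2440, 22: 2460, 26: 2480}
```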

B. Test environments 

This section describes the three different indoor environments where the experiments were carried out. In all the deployment environments, multiple 802.11 b/g networks create interference in the 2.4 GHz band [41].

1) Open environment: In an open indoor environment, i.e., one where no obstructions or objects are present, 30 sensors are deployed along the perimeter of a 70 m² area, as shown in Figure 2(a). The sensors are placed on podiums at a height of one meter from the floor. They transmit on 5 different channels, i.e., c ∈ {11, 15, 18, 22, 26}. During the tests, two people are instructed to walk at constant speed along a pre-defined rectangular path. In the first test, one person enters the monitored area and walks along the path, followed by the second person a few seconds later. The two people walk along the path in the same direction, i.e., one behind the other. In the second test, the two people enter the monitored area a few seconds apart, but this time the second person walks along the path in the opposite direction to the first person, so that the trajectories of the two people intersect. In this way, we can also measure the accuracy of the proposed multiple target tracking system when the targets converge to and then diverge from the same point in the monitored area. In this environment, we assume that people can enter and exit the monitored area at any point along the perimeter.

2) Apartment: We deploy 33 sensors in a one-bedroom, 58 m² (7 × 8.25 m) apartment, shown in Figures 2(b) and 2(d). The sensors are attached to the walls and furniture, at approximately one meter from the floor. They transmit on 4 different channels, i.e., c ∈ {15, 20, 25, 26}. To reduce the noise of the RSS measurements [5], the nodes are attached so as to keep the antenna at least 5 centimeters away from the wall. In the tests, two people walk along pre-defined paths whose trajectories intersect multiple times. We assume that the apartment has two specific entrance regions, located at the main door and at the sliding glass door separating the balcony from the living room.

3) Office environment: In a typical office environment, shown in Figures 2(c) and 2(e), containing several metallic objects, such as desks, chairs, computer towers, and monitors, we deploy 32 sensors to cover a 67 m² area (27 sensors along the perimeter, 5 in internal positions). The nodes along the perimeter are placed on podiums at a height of one meter from the floor, while the other five are positioned on desks. They transmit on 5 different channels, i.e., c ∈ {11, 15, 18, 22, 26}. In this environment, we conduct several tests with two, three, and four people simultaneously walking at constant speed along a pre-defined path. First, people enter the area one after the other, walk along the path in the same direction, and exit the area again one after the other. Then, people enter the area again one after the other, but this time they walk in opposite directions, so that their trajectories intersect multiple times.

TABLE III

Processing time for the GNN, SNN, and MHT methods

| Targets | Environment | Intersections (#) | GNN max[T_p] [ms] | GNN E[T_p] [ms] | SNN max[T_p] [ms] | SNN E[T_p] [ms] | MHT max[T_p] [ms] | MHT E[T_p] [ms] |
|---|---|---|---|---|---|---|---|---|
| 2 | open | NO | 15.4 | 7.3 | 14.5 | 7.0 | 16.8 | 7.3 |
| 2 | open | YES (9) | 13.5 | 7.2 | 12.3 | 7.0 | 622.5 | 134.6 |
| 2 | apartment | YES (4) | 19.4 | 6.4 | 18.2 | 5.9 | 712.9 | 121.1 |
| 2 | apartment | YES (5) | 19.3 | 6.5 | 11.8 | 5.8 | 537.7 | 109.2 |
| 2 | office | NO | 18.6 | 6.8 | 12.2 | 6.2 | 12.9 | 5.9 |
| 2 | office | YES (5) | 25.3 | 7.4 | 14.2 | 6.9 | 419.4 | 53.1 |
| 3 | office | NO | 26.1 | 9.3 | 21.8 | 8.8 | 231.6 | 28.0 |
| 3 | office | YES (6) | 30.3 | 10.4 | 19.2 | 9.2 | 1844.2 | 194.7 |
| 4 | office | NO | 35.6 | 11.6 | 34.9 | 11.4 | 321.3 | 34.6 |
| 4 | office | YES (9) | 43.4 | 13.3 | 33.5 | 11.5 | 4632.1 | 244.7 |



V. Results 



A. Evaluation metrics 



1) Cardinality error: The cardinality error e_c is calculated as the fraction of frames during the test in which the number of people moving in the monitored area and the estimated number of confirmed tracks differ.
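The definition of e_c amounts to a per-frame comparison. The sketch below assumes the true and estimated counts are available as two equal-length per-frame sequences (a hypothetical input format); the example counts are illustrative, not measured data.

```python
def cardinality_error(true_counts, estimated_counts):
    """e_c: fraction of frames in which the number of people in the area
    differs from the number of confirmed tracks."""
    if len(true_counts) != len(estimated_counts) or not true_counts:
        raise ValueError("need two non-empty sequences of equal length")
    mismatches = sum(t != e for t, e in zip(true_counts, estimated_counts))
    return mismatches / len(true_counts)


# Illustrative counts: a confirmation delay at an entrance and a late deletion at an exit.
true_n = [0, 1, 1, 2, 2, 2, 1, 0]
est_n = [0, 0, 1, 1, 2, 2, 2, 0]
e_c = cardinality_error(true_n, est_n)  # 3 mismatching frames out of 8
```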

2) Tracking accuracy: To evaluate the tracking accuracy, we use the optimal mass transfer (OMAT) metric, introduced in [42] and defined as:

$$
e_o(T, Z) = \left( \frac{1}{|T|} \min_{v \in \Gamma} \sum_{i=1}^{|T|} d\left(t_i, z_{v(i)}\right)^{q} \right)^{1/q} \tag{25}
$$

in which Γ is the set of all the possible permutations v between the set of estimated targets T and the set of real targets Z, d is the Euclidean distance, and q is the order of the metric (we set q = 2). Thus, the OMAT error e_o is equal to the root mean square error (RMSE) of the best possible association between estimated and real targets.
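For a handful of targets, Eq. (25) can be evaluated by brute force over all permutations. This sketch assumes |T| = |Z| and q = 2 by default; the function name is hypothetical, and for larger sets an assignment solver would replace the exhaustive search, whose cost grows as n!.

```python
import itertools
import math


def omat_error(estimated, real, q=2):
    """OMAT error between equal-size sets of 2-D positions, Eq. (25)."""
    n = len(estimated)
    if n == 0 or n != len(real):
        raise ValueError("this sketch assumes |T| == |Z| > 0")
    # Best association: minimum over all permutations of the summed q-th powers.
    best = min(
        sum(math.dist(t, real[j]) ** q for t, j in zip(estimated, perm))
        for perm in itertools.permutations(range(n))
    )
    return (best / n) ** (1.0 / q)
```

With q = 2 this is exactly the RMSE of the best association: for estimates at (0, 0) and (3, 0) and real targets at (3, 1) and (0, 1), the best pairing matches each estimate to the target 1 m away, giving e_o = 1 m.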

3) OSPA metric: Unlike the OMAT metric defined in (25), the optimal subpattern assignment (OSPA) metric [43] includes an additional term, weighted by a constant g measured in meters, which penalizes the cardinality error. When T is the set of estimated targets, Z the set of real targets, and |T| ≤ |Z|, the OSPA metric $e_p^{(g)}(T, Z)$ can be calculated as:

$$
e_p^{(g)}(T, Z) = \left( \frac{1}{|Z|} \left( \min_{v \in \Gamma} \sum_{i=1}^{|T|} d^{(g)}\left(t_i, z_{v(i)}\right)^{q} + g^{q} \left(|Z| - |T|\right) \right) \right)^{1/q} \tag{26}
$$

where $d^{(g)}(t, z) = \min\{d(t, z), g\}$ and d is the Euclidean distance between the estimated and real position of a target. We set q = 2. When |T| > |Z|, the OSPA metric is calculated as $e_p^{(g)}(Z, T)$, i.e., as in (26) but with T and Z swapped.
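Eq. (26), including the swap for |T| > |Z|, can be sketched the same way. This is an illustrative brute-force implementation (hypothetical function name) suitable only for a few targets.

```python
import itertools
import math


def ospa_error(estimated, real, g=5.0, q=2):
    """OSPA error with cut-off / cardinality penalty g [m] and order q, Eq. (26).

    If |T| > |Z|, the two sets are swapped, as described in the text.
    """
    small, large = (estimated, real) if len(estimated) <= len(real) else (real, estimated)
    m, n = len(small), len(large)
    if n == 0:
        return 0.0  # both sets empty
    # Assign every element of the smaller set to a distinct element of the larger one,
    # with each distance cut off at g.
    best = min(
        (sum(min(math.dist(s, large[j]), g) ** q for s, j in zip(small, perm))
         for perm in itertools.permutations(range(n), m)),
        default=0.0,
    )
    # Each unassigned element of the larger set costs g^q.
    return ((best + g ** q * (n - m)) / n) ** (1.0 / q)
```

For example, one track placed exactly on one of two real targets gives sqrt((0 + 5²)/2) ≈ 3.54 m with g = 5: a single missed target dominates the error, which is why g is called a cardinality penalty.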



4) Tracking consistency: Since in some of the tests the trajectories of the targets intersect, we define a metric to evaluate the capability of the DFL system to avoid losing track of targets during and after their intersection. For this purpose, we use the 95th percentile Q95 of the OMAT errors of the estimated tracks. A low Q95 indicates that, even with intersecting trajectories and noisy RTI images, the system does not lose track of the targets.
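Q95 can be computed from the per-frame OMAT errors of a test. The sketch below uses linear interpolation between order statistics (the same convention as NumPy's default `numpy.percentile`); the text does not state which percentile convention was used, so this choice is an assumption, and the error values are illustrative, not measured data.

```python
def percentile(values, p):
    """p-th percentile (0-100), linear interpolation between order statistics."""
    xs = sorted(values)
    if not xs:
        raise ValueError("empty sequence")
    rank = (p / 100.0) * (len(xs) - 1)
    lo = int(rank)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (rank - lo) * (xs[hi] - xs[lo])


# Illustrative per-frame OMAT errors [m], with one frame where a track drifted away.
errors = [0.20, 0.25, 0.31, 0.28, 0.35, 1.40, 0.30, 0.27, 0.33, 0.29]
q95 = percentile(errors, 95)
```

A few frames with large errors, e.g., after a track swap at an intersection, barely move the mean but raise Q95, which is why the two metrics are reported together.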

B. Experimental results 

1) Processing time: The maximum, max[T_p], and average, E[T_p], processing times of the multiple target tracking algorithms are measured on a laptop with a 2.50 GHz Intel® Core™ i5-2450P processor and 8.0 GB of RAM. Table III lists max[T_p] and E[T_p] for the two considered nearest neighbor approaches, GNN and SNN. For comparison, the table also includes max[T_p] and E[T_p] for the multiple hypothesis tracking (MHT) method [30], [32]. Both nearest neighbor methods are well below the limit set by the length of an entire TDMA communication cycle on one channel (see Section IV-A), which is approximately 100 ms for a network composed of about 30 nodes. The results show that the SNN method is faster than the GNN method. In contrast, the MHT method does not fulfill the real-time requirement, especially when the trajectories of the targets intersect. In this situation, the MHT method builds a tree of hypotheses for the possible paths followed by the targets based on the available observations, postponing the decision until subsequent RTI images that are easier to interpret. Moreover, the processing time of the MHT method increases considerably with the number of targets to be tracked, due to the increased size of the hypothesis tree. For the GNN and SNN methods, max[T_p] and E[T_p] also increase with the number of targets, due to the higher number of voxels going through the clustering process and the higher number of observations that have to be associated with the existing tracks. However, both methods cope much better than MHT with the increased complexity of the tracking problem.
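To make the difference between the two nearest neighbor rules concrete, the following sketch compares them on a toy cost matrix. It assumes Euclidean distance as the association cost and a greedy rule that repeatedly picks the globally cheapest remaining (track, observation) pair; the paper's actual cost function, gating, and solver are not reproduced, and the function names are hypothetical. The GNN side uses brute force for clarity; a practical implementation would use a polynomial-time assignment solver such as the Hungarian algorithm.

```python
import itertools
import math


def gnn_assign(cost):
    """GNN: minimum-total-cost one-to-one assignment of tracks (rows) to
    observations (columns), by brute force. Assumes rows <= columns."""
    n_tracks, n_obs = len(cost), len(cost[0])
    best = min(itertools.permutations(range(n_obs), n_tracks),
               key=lambda perm: sum(cost[i][j] for i, j in enumerate(perm)))
    return list(enumerate(best))


def snn_assign(cost):
    """Greedy SNN: take the cheapest remaining (track, observation) pair,
    then discard its row and column, until no pair is left."""
    pairs = sorted((c, i, j) for i, row in enumerate(cost) for j, c in enumerate(row))
    used_tracks, used_obs, result = set(), set(), []
    for _, i, j in pairs:
        if i not in used_tracks and j not in used_obs:
            used_tracks.add(i)
            used_obs.add(j)
            result.append((i, j))
    return sorted(result)


def total_cost(cost, assignment):
    return sum(cost[i][j] for i, j in assignment)


tracks = [(0.0, 0.0), (2.0, 0.0)]
observations = [(1.1, 0.0), (3.5, 0.0)]
cost = [[math.dist(t, z) for z in observations] for t in tracks]
```

Here the greedy rule grabs the single closest pair (track 1 with observation 0, 0.9 m) and is then forced into a 3.5 m association, for a total of 4.4 m against the optimal 2.6 m found by GNN. When blobs are well separated, both rules usually pick the same pairs, and SNN saves the cost of the exhaustive or optimal search.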

2) Cardinality error: The cardinality error (e_c) and the accuracy (e_o) and consistency (Q95) of the tracking obtained by applying the methods presented in this paper are listed in Table IV. In all the experiments, the cardinality error is limited to a few frames before the actual entrance of the targets into the monitored area and a few frames after their actual exit, as shown in Figure 3. This is due to the confirmation and deletion rules discussed in Section III-D1.

[Figure: true and estimated number of targets over frames, panels (a)-(c)]

Fig. 3. Comparison between the true number of targets in the monitored area and the number of targets estimated by the DFL system during the tests with targets having separated trajectories: (a) two targets in the open environment; (b) three targets and (c) four targets in the office environment.

During the experiments, we observed that the delay in detecting the entrance of a target into the monitored area caused by the confirmation rule can sometimes be compensated by the use of multiple frequency channels. Deep-fade links show RSS variations even when the person is far away from the link line. When a new target approaches the entrance region, the RTI images start showing a new blob. If this blob is confirmed over multiple successive frames, the presence of the new target is detected slightly before its actual entrance into the monitored area. This effect can be observed in Figures 3(a) and 3(b) at the entrance of the first target.

Fig. 4. The real (solid lines) and estimated (dashed lines) coordinates of the targets during a test carried out in the office environment. Despite multiple intersections, e_o = 0.44 m and Q95 = 0.99 m.

3) Tracking accuracy and consistency: Regarding tracking accuracy, the largest average OMAT error, e_o = 0.55 m, is measured when four targets with intersecting trajectories are tracked in the office environment, where multipath propagation of the radio signals is predominant. By contrast, the largest OMAT errors measured in the two tests in the open environment are e_o = 0.33 m and e_o = 0.30 m.

In the apartment environment, the difference in the e_o and Q95 values measured in the two tests is due to the paths covered by the two people, which were more separated in the second experiment than in the first one. In fact, the locations where the two tracks intersect have a large impact on the consistency of the tracking accuracy. Because of the particular positioning of the sensors in an environment with walls, various objects, and furniture, certain areas are covered by a higher number of anti-fade links (which favor localization), whereas in others the number of deep-fade links (less reliable for localization) is larger [5]. For this reason, the amount of noise in the RTI images formed when the two trajectories intersect differs between tests, and this explains the slower convergence of the estimated tracks to the real ones after the intersections. Despite this, the average OMAT error measured during the two tests is approximately 0.30 m, with both the GNN and SNN methods.

In the office environment, the average OMAT error measured with two targets is slightly higher than in the other two environments, for both separated (e_o = 0.38 m) and intersecting (e_o = 0.45 m) trajectories. The difference arises because this environment, with its multiple desks, chairs, computer towers, and monitors, is the most challenging for DFL. Nevertheless, the largest e_o and Q95 measured with four targets are 0.45 m and 0.96 m with separated trajectories, and 0.55 m and 1.26 m with intersecting trajectories. These results show that our DFL system can accurately and consistently track up to four targets in a challenging indoor environment where multipath propagation is predominant. Figure 4 shows the real and estimated trajectories of the test conducted in the office environment with three targets having intersecting trajectories. The cumulative distribution functions (CDFs) of e_o obtained with the GNN and SNN methods in the tests carried out in the office environment with intersecting trajectories are shown in Figure 5. Since the GNN method finds the optimal solution of the DAP at each frame, its improvement in Q95 over the SNN method becomes more pronounced as the images become noisier, i.e., when four targets with intersecting trajectories are tracked.

TABLE IV

e_c, e_o AND Q95 FOR THE GNN AND SNN METHODS

| Targets | Environment | Intersections (#) | e_c | GNN e_o [m] | GNN Q95 [m] | SNN e_o [m] | SNN Q95 [m] |
|---|---|---|---|---|---|---|---|
| 2 | open | NO | 0.001 |  |  |  |  |
| 2 | open | YES (9) | 0.001 | 0.32 | 0.74 | 0.33 | 0.77 |
| 2 | apartment | YES (4) | 0.001 | 0.30 | 0.71 | 0.30 | 0.75 |
| 2 | apartment | YES (5) | 0.002 | 0.26 | 0.68 | 0.27 | 0.71 |
| 2 | office | NO | 0.029 | 0.38 | 0.75 | 0.38 | 0.76 |
| 2 | office | YES (5) | 0.023 | 0.44 | 0.96 | 0.45 | 1.01 |
| 3 | office | NO | 0.016 | 0.37 | 0.66 | 0.37 | 0.67 |
| 3 | office | YES (6) | 0.032 | 0.44 | 1.05 | 0.46 | 1.07 |
| 4 | office | NO | 0.027 | 0.44 | 0.94 | 0.45 | 0.96 |
| 4 | office | YES (9) | 0.029 | 0.54 | 1.16 | 0.55 | 1.26 |

[Figure: CDFs of the average OMAT error [m] for the GNN and SNN methods with 2, 3, and 4 targets]

Fig. 5. The cumulative distribution functions of the average OMAT error obtained with the GNN (solid lines) and SNN (dashed lines) methods in the tests carried out in the office environment with targets having intersecting trajectories.

TABLE V

Average OSPA errors for the tests in the office environment

| Targets | Intersections | Average OSPA error [m], g = 1 | g = 2.5 | g = 5 |
|---|---|---|---|---|
| 2 | no | 0.41 | 0.45 | 0.51 |
| 2 | yes | 0.46 | 0.51 | 0.56 |
| 3 | no | 0.39 | 0.41 | 0.43 |
| 3 | yes | 0.48 | 0.59 | 0.75 |
| 4 | no | 0.47 | 0.52 | 0.59 |
| 4 | yes | 0.57 | 0.69 | 0.83 |

Table V lists the average OSPA errors for the tests carried out in the office environment for different values of the cardinality penalty g. Although no assumption is made on the number of targets in the monitored area, the fraction of frames in which |T| ≠ |Z| is very small; as a result, the average OSPA errors increase only slightly even when the cardinality penalty is set to a very high value (g = 5).

Fig. 6. e_o, Q95, max[T_p], and E[T_p] measured with the GNN method for different values of T_c for the test carried out in the office environment with 4 targets having intersecting trajectories. The solid line in the bottom plot represents the real-time limit for the processing time.

4) Sensitivity analysis: Figure 6 shows e_o, Q95, max[T_p], and E[T_p] measured with the GNN method for different values of the clustering threshold T_c (see Section III-B) for the test carried out in the office environment with four targets having intersecting trajectories. T_c determines the average size of the formed clusters and, ultimately, their number. The results show that the tracking accuracy of the system remains fairly stable for different values of T_c, with e_o ranging from 0.54 m to 0.74 m. Regarding processing time, when T_c takes small values the system cannot process the RTI images in real time. This is due to the large number of formed clusters, which increases the time required to solve the data association problem (see Section III-D2). Both max[T_p] and E[T_p] decrease as T_c increases. Similar results were obtained for all the other tests.
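To illustrate why a small T_c inflates the number of clusters, the sketch below groups voxel centers with a simple single-linkage rule: two voxels end up in the same cluster if they are connected by a chain of voxels less than T_c apart. This rule is an assumption made for illustration only; the clustering procedure actually used is the one described in Section III-B, which may differ. The function name and voxel coordinates are hypothetical.

```python
import math


def threshold_clusters(points, t_c):
    """Single-linkage clustering with distance threshold t_c [m], via union-find."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps the trees shallow
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) < t_c:
                parent[find(i)] = find(j)  # merge the two clusters

    clusters = {}
    for i in range(len(points)):
        clusters.setdefault(find(i), []).append(points[i])
    return list(clusters.values())


# Hypothetical active voxels: two blobs of three voxels, about 2 m apart.
voxels = [(0.0, 0.0), (0.15, 0.0), (0.3, 0.0), (2.3, 0.0), (2.45, 0.0), (2.6, 0.0)]
counts = {t_c: len(threshold_clusters(voxels, t_c)) for t_c in (0.1, 0.5, 1.25)}
```

With a threshold smaller than the voxel spacing, every voxel becomes its own cluster, so the data association step receives six observations instead of two, consistent with the processing-time trend described above.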

VI. Related Work 

In this section, we review previous work on multiple target tracking with RF sensor networks. Table VI summarizes the characteristics of these systems. In [15], the authors present a clustering algorithm that groups the links whose RSS is affected by the same object. In this work, the nodes are positioned on the ceiling of an open indoor environment. The RSS measurements of each identified cluster are then processed separately to localize the targets. The results show that this approach achieves a 1.08 m RMSE with two targets. However, to achieve these results, the probabilistic cover algorithm introduced in the paper requires the two targets to be separated by at least 5 m. Moreover, the latency of the system is approximately 2 s. In [16], the monitored area is divided into different sections, each of which is covered by three nodes communicating on a different frequency channel and positioned on the ceiling so as to form a triangle. A support vector regression model is applied to locate the targets. The results show that the system can detect and locate two targets when these are positioned in different triangles, i.e., at least 2 m apart, with a 0.98 m RMSE. With this system, the latency is reduced to 0.26 s.

TABLE VI
Comparison table

| Work | Targets | Intersecting trajectories | Environment | Error [m] | Latency / time per step [s] | Notes |
|---|---|---|---|---|---|---|
| Zhang et al. 2009 | 2 | no | Uncluttered indoor | 1.08 | 2.00 | Targets at least 5 m apart. |
| Zhang et al. 2011 | 2 | no | Uncluttered indoor | 0.98 | 0.26 | Targets at least 2 m apart. |
| Wilson and Patwari 2011 | 2 | no | Bookstore / Through wall | 0.84 / 1.10 | Not in real-time | Number of targets fixed and known a priori. |
| Thouin et al. 2011 | 4 | no | Uncluttered outdoor | 0.63 / 0.77 / 0.96 | 7.6 / 1.7 / 0.048 (real-time) | Number of targets fixed and known a priori. |
| Nannuru et al. 2012 | 3 | no | Uncluttered indoor / Office | 0.80 | Real-time if N_p < 100 | Number of targets fixed and known a priori. Frequent cardinality errors (e_c) with a varying number of targets. |
| Bocca et al. 2012 | 4 | yes | Uncluttered indoor / Apartment / Office | 0.55 | Real-time |  |

The work in lfl~3l presents the results of tests carried out in a cluttered bookstore and in a through-wall scenario with 2 targets having separated trajectories. The authors use a particle filter [28] to track the two targets, measuring an RMSE of 0.84 m in the bookstore and of 1.1 m in the through-wall scenario. The number of targets is assumed to be known a priori. Due to the computational complexity of the particle filter method, localization and tracking are not performed in real time.

In [Q20, the authors introduce a novel measurement model which assumes that the RSS attenuation due to the simultaneous presence of multiple targets on the link line is approximately equal to the sum of the attenuations caused by the individual targets. This additive model is then applied in ifTHTl and [11]. The tests reported in ITSl are conducted in an uncluttered outdoor environment with up to four targets having separated trajectories. During each test, the number of targets is assumed to be fixed and known a priori. For tracking, three types of PF algorithms are used. The initial set of particles for each target is drawn from a Gaussian distribution centered around the real target location. The results show the trade-off between tracking accuracy (which depends on the number of particles used per target) and processing time (which depends on the computational complexity of the tracking algorithms). With four targets, the most accurate tracking algorithm achieves 0.63 m average error with 500 particles per target, requiring on average 7.6 s per time step. When 50 particles per target are used, the most accurate tracking algorithm achieves 0.77 m error, requiring 1.7 s per time step. On the other hand, the fastest of the three algorithms achieves 0.96 m error in 48 ms per time step. In comparison, our system achieves a lower average error (0.55 m) while requiring on average 13.3 ms per time step.

In [11], the tests are carried out in an open indoor environment and in an office environment with up to three targets having separated trajectories. The authors use different particle filters to track the targets. The filters perform better on average when the particles are initialized according to a Gaussian distribution centered at the true target locations, which in this case are assumed to be known a priori. The performance of the filters degrades when the particles are initialized according to a uniform distribution within the observation region. When the number of targets is assumed to be fixed and known a priori, the most accurate tracking algorithm achieves an RMSE of 0.30 m, 0.72 m, and 0.80 m with one, two, and three targets, respectively. However, when the number of targets varies, the applied algorithms are prone to mis-estimate the number of targets in several time steps. In the tests with two targets, this increases the average OSPA error to 1.35 m when the cardinality penalty is g = 5. In comparison, our system achieves an average error of 0.45 m, 0.46 m, and 0.55 m with two, three, and four targets, respectively, and a maximum average OSPA error of 0.83 m with four targets when g = 5.

VII. Conclusion 

This paper presents an RF sensor network capable of tracking, in real time, multiple targets simultaneously moving in an area where low-power wireless transceivers are deployed. The system does not require the people being tracked to participate in the localization task by carrying any radio device or RFID tag. Instead, the RSS measurements collected on multiple frequency channels on the links of the mesh network are processed to form RTI images, i.e., images of the change in the propagation field due to the presence of people in the area. Using machine vision methods adapted to RTI, we process the RTI images in real time to detect and track the blobs corresponding to real targets. In this work, we address the situation in which the targets have intersecting trajectories. Moreover, we apply computationally lightweight methods that can be executed in real time. Furthermore, during the tests the number of targets varies and is not assumed to be known a priori.
We conduct experiments in three different indoor environments, i.e., an open environment with no obstructions or objects, a one-bedroom apartment with internal walls, furniture, and various objects, and a heavily cluttered office environment. In the tests, up to four targets with intersecting trajectories are tracked. The results show that our DFL system is able to correctly estimate the number of targets in the monitored area and to accurately track them in real time in all three environments, also when their trajectories intersect.
The measured tracking RMSE ranges from 0.27 m (achieved in the apartment environment with two targets having intersecting trajectories) to 0.55 m (achieved in the office environment with four targets having intersecting trajectories).

In future work, we will address other challenging tracking situations, such as people entering the monitored area side-by-side and then splitting and moving in different directions. In addition, whenever two (or more) targets converge and then separate again, the system has a high probability of not keeping the correct track-to-blob association. To overcome this limitation, we will equip the targets with an active badge and use the AGAPE method presented in [44] to maintain the correct track-to-blob association at all times.